Rapid development of RBMT systems for related languages
Abstract
The article describes a new way of constructing rule-based machine translation (RBMT) systems. RBMT systems are currently among the best-performing machine translation systems. Most of the big-name machine translation systems (Systran, 2007) (Promt, 2007) belong to this category, but they have a major drawback: their construction demands a great amount of time and resources, which makes them very expensive. The article describes methods that automate parts of the construction process. The methods were evaluated in a case study: the construction of a fully functional machine translation system for the closely related language pair Slovene-Serbian. Slovene and Serbian belong to the group of South Slavic languages, which were spoken mostly in the former Yugoslavia. Slovene is mostly spoken in Slovenia; Serbian is mostly spoken in Serbia and Montenegro. The languages share common roots and, even more importantly, a common recent historical environment: they were spoken in the same country and were even taught in schools as languages of the surroundings. The economies of all three states are closely connected, and the younger generations, those who grew up after the breakup of Yugoslavia, have difficulties in mutual communication, so there is considerable interest in the construction of such a translation system. The system is based on Apertium (Armentano-Oller et al., 2007) (Corbí-Bellot et al., 2005), an open-source RBMT toolkit. Apertium uses a shallow-transfer machine translation engine which processes the input text in stages, as in an assembly line: de-formatting, morphological analysis, part-of-speech disambiguation, shallow structural transfer, lexical transfer, morphological generation, and re-formatting.
The data needed by these stages can be grouped into three categories: monolingual dictionaries, used by morphological analysis and morphological generation; bilingual dictionaries, used in lexical transfer; and structural transfer rules, used in structural transfer. The creation of each group's data was addressed by a particular method: monolingual dictionaries were constructed using bilingual dictionary data and automatic paradigm tagging techniques; the bilingual dictionary was constructed from an available bilingual word list, although a few methods for automatic bilingual dictionary construction were also investigated; and a method for automatic construction of shallow structural transfer rules (Sánchez-Martínez et al., 2006) was used to construct the set of structural transfer rules.
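The assembly line and its three data categories can be illustrated with a minimal Python sketch. This is not Apertium's implementation or API: every function name and every dictionary entry below is a hypothetical toy invented for illustration, disambiguation simply keeps the first analysis, and the structural transfer stage is an identity placeholder. The stage order follows the list given in the abstract.

```python
from functools import reduce

# Toy data, invented for illustration only (not taken from the article's dictionaries).
SL_ANALYSIS = {                      # monolingual (source) dictionary: surface form -> (lemma, tags)
    "velika": ("velik", "adj.f.sg.nom"),
    "hiša": ("hiša", "n.f.sg.nom"),
}
SL_SR_BILINGUAL = {"velik": "velik", "hiša": "kuća"}   # bilingual dictionary: lemma -> lemma
SR_GENERATION = {                    # monolingual (target) dictionary: (lemma, tags) -> surface form
    ("velik", "adj.f.sg.nom"): "velika",
    ("kuća", "n.f.sg.nom"): "kuća",
}

def deformat(text):
    # De-formatting: here reduced to whitespace tokenisation.
    return text.split()

def analyse(tokens):
    # Morphological analysis: each token gets a list of candidate analyses.
    return [[SL_ANALYSIS.get(tok, (tok, "unknown"))] for tok in tokens]

def disambiguate(analyses):
    # Part-of-speech disambiguation: assumption, keep the first candidate.
    return [candidates[0] for candidates in analyses]

def structural_transfer(units):
    # Shallow structural transfer: identity placeholder, no rules in this toy.
    return units

def lexical_transfer(units):
    # Lexical transfer: replace source lemmas with target lemmas, keep tags.
    return [(SL_SR_BILINGUAL.get(lemma, lemma), tags) for lemma, tags in units]

def generate(units):
    # Morphological generation: unknown units fall back to the bare lemma.
    return [SR_GENERATION.get(unit, unit[0]) for unit in units]

def reformat(tokens):
    # Re-formatting: here reduced to joining tokens with spaces.
    return " ".join(tokens)

PIPELINE = [deformat, analyse, disambiguate, structural_transfer,
            lexical_transfer, generate, reformat]

def translate(text):
    # Feed the output of each stage into the next, as in an assembly line.
    return reduce(lambda data, stage: stage(data), PIPELINE, text)

print(translate("velika hiša"))
```

In this sketch the linguistic knowledge lives entirely in the three data tables, so automating their construction, as the article proposes, would leave the stage functions unchanged.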
Related resources
OpenMT: Open Source Machine Translation Using Hybrid Methods
The main goal of the OpenMT project is the development of open source machine translation architectures based on hybrid models and advanced syntactic–semantic processors. These architectures combine the three main Machine Translation (MT) frameworks, Rule-based (RBMT), Statistical (SMT) and Example–based (EBMT), into hybrid systems. Defined architectures and results will be open source, allow f...
Česílko Goes Open-source
The Machine Translation system Česílko has been developed as an answer to a growing need of translation and localization from one source language to many target languages. The system belongs to the shallow parse, shallow transfer RBMT paradigm and it is designed primarily for translation of related languages. The paper presents the architecture, the development design and the basic installation ins...
Developing Prototypes for Machine Translation between Two Sámi Languages
This paper describes the development of two prototype systems for machine translation between North Sámi and Lule Sámi. Experiments were conducted in rule-based machine translation (RBMT), using the Apertium platform, and statistical machine translation (SMT) using the Mosesdecoder. The experiments show that both approaches have their advantages and disadvantages, and that they can both make us...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
A case study on English-Malayalam Machine Translation
In this paper we present our work on a case study on Statistical Machine Translation (SMT) and Rule based machine translation (RBMT) for translation from English to Malayalam and Malayalam to English. One of the motivations of our study is to make a three way performance comparison, such as, a) SMT and RBMT b) English to Malayalam SMT and Malayalam to English SMT c) English to Malayalam RBMT an...